A theory of symbolic dynamic systems with long-range correlations, based on the binary N-step Markov
chains developed earlier in Phys. Rev. Lett. 90, 110601 (2003), is generalized to the biased case
(unequal numbers of zeros and unities in the chain). In the model, the conditional probability that
the i-th symbol in the chain equals zero (or unity) is a linear function of the number of unities
(zeros) among the preceding N symbols. The correlation and distribution functions, as well as the
variance of the number of symbols in words of arbitrary length L, are obtained analytically and
verified by numerical simulations. A self-similarity of the studied stochastic process is revealed and
the similarity group transformation of the chain parameters is presented. The diffusion Fokker-Planck
equation governing the distribution function of the L-words is explored. If the persistent
correlations are not extremely strong, the distribution function is shown to be Gaussian, with a
variance that depends nonlinearly on L. An equation connecting the memory and correlation functions of
the additive Markov chain is presented. This equation allows one to reconstruct the memory function
from the correlation function of the system. The effectiveness and robustness of the proposed method
are demonstrated by simple model examples. Memory functions of concrete coarse-grained literary texts
are found and their universal power-law behavior at long distances is revealed.

PACS numbers: 05.40.-a, 02.50.Ga, 87.10.+e

INTRODUCTION 

The problem of systems with long-range spatial and/or temporal correlations (LRCS) is one of the topics of intensive
research in modern physics, as well as in the theory of dynamic systems and the theory of probability. The LRC systems
are usually characterized by a complex structure and contain a number of hierarchic objects as their subsystems.
The LRCS are the subject of study in physics, biology, economics, linguistics, sociology, geography, psychology,
etc.

One of the efficient methods to investigate correlated systems is based on a decomposition of the space of
states into a finite number of parts labeled by definite symbols. This procedure, referred to as coarse graining, can be
accompanied by the loss of short-range memory between states of the system but does not damage
its robust invariant statistical properties on large scales. The most frequently used method of decomposition
is based on the introduction of two parts of the phase space. In other words, it consists in mapping the two parts of
states onto two symbols, say 0 and 1. Thus, the problem is reduced to investigating the statistical properties of
symbolic binary sequences. This method is applicable to the examination of both discrete and continuous systems.

One of the ways to get a correct insight into the nature of correlations consists in constructing a
mathematical object (for example, a correlated sequence of symbols) possessing the same statistical properties as
the initial system. There are many algorithms to generate long-range correlated sequences: the inverse Fourier
transform, the expansion-modification Li method, the Voss procedure of consequent random addition, the
correlated Levy walks, etc. We believe that, among the above-mentioned methods, using the Markov chains is
one of the most important. This was demonstrated in earlier work, where the Markov chains with the step-like memory
function (MF) were studied. It was shown that there exist some dynamical systems (coarse-grained sequences of the
Eukarya's DNA and dictionaries) with correlation properties that can be properly described by this model.

The many-step Markov chain is a sequence of symbols of some alphabet constructed using a conditional probability
function, which determines the probability of occurrence of some definite symbol of the sequence depending on the N previous
ones. The property of additivity of a Markov chain means the independent influence of different previous symbols on
the generated one. The concept of additivity, primarily introduced in paper [10], was later generalized for the case of
binary non-stationary Markov chains [13]. Another generalization was based on consideration of Markov sequences
with a many-valued alphabet. Here we generalize the results of Phys. Rev. Lett. 90, 110601 (2003) to the biased case,
where the numbers of zeros and unities are not supposed to be equal.

In the present work, we also continue investigating additive Markov chains with more complex memory functions.
An equation connecting mutually complementary characteristics of a random sequence, i.e. the memory and
correlation functions, is obtained. Upon finding the memory function of the original random sequence on the basis
of the analysis of its statistical properties, namely, its correlation function, we can build the corresponding Markov
chain, which possesses the same statistical properties as the initial sequence.

FORMULATION OF THE PROBLEM 
Conditional probability of the many-step additive Markov chain 

Let us consider a stationary binary sequence of symbols $a_i$, $a_i = \{0, 1\}$, $i \in \mathbb{Z} = \ldots, -2, -1, 0, 1, 2, \ldots$. To determine
the N-step Markov chain we have to introduce the conditional probability $P(a_i \mid a_{i-N}, a_{i-N+1}, \ldots, a_{i-1})$ of occurring
the definite symbol $a_i$ (for example, $a_i = 1$) after the N-word $T_{N,i}$, where $T_{N,i}$ stands for the sequence of symbols
$a_{i-N}, a_{i-N+1}, \ldots, a_{i-1}$. Thus, it is necessary to define $2^N$ values of the P-function corresponding to each possible
configuration of the symbols $a_i$ in the N-word $T_{N,i}$.

We suppose that the conditional probability $P(a_i \mid T_{N,i})$ differs from zero and unity for any word $T_{N,i}$, which provides
the metrical transitivity of the Markov chain (see Appendix). In turn, according to the Markov theorem, this property
leads to the ergodicity of the symbolic system under consideration.

Since we intend to apply our theory to sequences with long memory lengths of the order of $10^6$, some special
restrictions on the class of P-functions should be imposed. We consider the memory function of the additive form,

$$ P(a_i = 1 \mid T_{N,i}) = \frac{1}{N} \sum_{r=1}^{N} f(a_{i-r}, r). \qquad (1) $$

Here the function $f(a_{i-r}, r)/N$ describes the additive contribution of the symbol $a_{i-r}$ to the conditional probability of
occurring the symbol unity, $a_i = 1$, at the ith site. The homogeneity of the Markov chain is provided by independence
of the conditional probability Eq. (1) of the index i. It is possible to consider Eq. (1) as the first term in the expansion
of the conditional probability in the formal series, where each term corresponds to the additive (unary), binary, ternary,
and so on functions up to the N-ary one.

It is reasonable to assume the function f to be decreasing with an increase of the distance r between the symbols
$a_{i-r}$ and $a_i$ in the Markov chain. However, for the sake of simplicity we consider a step-like memory function $f(a_{i-r}, r)$
independent of the second argument r. As a result, the model is characterized by three parameters only, specifically
by f(0), f(1), and N:

$$ P(a_i = 1 \mid T_{N,i}) = \frac{1}{N} \sum_{r=1}^{N} f(a_{i-r}). \qquad (2) $$

Note that the probability P in Eq. (2) depends on the numbers of symbols 0 and 1 in the N-word but is independent
of the arrangement of the elements $a_{i-k}$. Instead of two parameters f(0) and f(1) it is convenient to introduce new
independent parameters $\nu$ and $\mu$ (see below Eq. (6)),

$$ f(0) + f(1) = 1 + 2\nu, \qquad |\nu| < 1/2. \qquad (3) $$

The parameter $\nu$ provides the statistical inequality of the numbers of symbols zero and unity in the Markov chain under
consideration. In other words, the chain is biased. Indeed, taking into account Eqs. (2) and (3) and the sequence of
equations,

$$ P(a_i = 1 \mid T_{N,i}) = \frac{1}{N} \sum_{r=1}^{N} f(\tilde a_{i-r}) - 2\nu = P(\tilde a_i = 0 \mid \tilde T_{N,i}) - 2\nu, \qquad (4) $$

one can see the lack of symmetry with respect to the interchange $\tilde a_i \leftrightarrow a_i$ in the Markov chain if $\nu \neq 0$. Here $\tilde a_i$ is the
symbol "opposite" to $a_i$, $\tilde a_i = 1 - a_i$, and $\tilde T_{N,i}$ is the word "opposite" to $T_{N,i}$. Therefore, the probabilities of occurring
the words $T_{L,i}$ and $\tilde T_{L,i}$ are not equal to each other for any word of the length L. At L = 1 this yields unequal
average probabilities that symbols 0 and 1 occur in the chain. In particular, the probability of occurring symbol 0 is greater
by $2\nu$ than that of symbol 1. If $\nu = 0$ one has the non-biased case.

Taking into account the symmetry of the conditional probability P with respect to a permutation of symbols $a_i$
(see Eq. (2)), we can simplify the notations and introduce the conditional probability $p_k$ of occurring the symbol zero
after the N-word containing k unities, e.g., after the word $(\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k})$,

$$ p_k = P(a_{N+1} = 0 \mid \underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k}) = \frac{1}{2} + \nu + \mu\left(1 - \frac{2k}{N}\right), \qquad (5) $$

with the correlation parameter $\mu$ being defined by the relation

$$ \mu = \frac{f(0) - f(1)}{2}. \qquad (6) $$

We focus our attention mainly on the region of $\mu$ determined by the persistence inequality $0 < \mu$. In this case, each
of the symbols unity in the preceding N-word promotes the birth of a new symbol unity. Nevertheless, the major part
of our results is valid for the anti-persistent region $\mu < 0$ as well. Note that the inequalities $|\nu| < 1/2$ and $|\mu + \nu| < 1/2$
follow from Eq. (5). Without loss of generality, we consider the case $\nu > 0$ only.
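The rule of Eq. (5) can be turned into a short generator. The following is a minimal sketch, not the authors' code: the function name `generate_chain`, the seeding, and the choice of a random unbiased start-up word (discarded from the output) are assumptions made for illustration only.

```python
import random


def generate_chain(n_symbols, N, mu, nu, seed=0):
    """Generate a binary chain whose zero-probability follows Eq. (5).

    Hypothetical helper: the start-up N-word is an assumption and is
    dropped from the returned sequence.
    """
    rng = random.Random(seed)
    chain = [rng.randint(0, 1) for _ in range(N)]  # assumed initial N-word
    k = sum(chain)  # number of unities in the current N-word
    for _ in range(n_symbols):
        # Probability of generating zero after an N-word with k unities.
        p_zero = 0.5 + nu + mu * (1.0 - 2.0 * k / N)
        new = 0 if rng.random() < p_zero else 1
        chain.append(new)
        # Slide the window: add the new symbol, drop the one that left.
        k += new - chain[-N - 1]
    return chain[N:]
```

Averaging Eq. (5) over a stationary chain suggests that the mean concentration of unities should approach 1/2 - nu/(1 - 2 mu), which gives a simple sanity check for a generated sequence.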



Statistical characteristics of the chain 



In order to investigate the statistical properties of the Markov chain, we consider the distribution $W_L(k)$ of the
words of definite length L by the number k of unities in them,

$$ k(L) = \sum_{i=1}^{L} a_{l+i}, \qquad (7) $$

and the variance of k,

$$ D(L) = \overline{k^2} - \bar{k}^2, \qquad (8) $$

where

$$ \overline{g(k)} = \sum_{k=0}^{L} g(k)\,W_L(k). \qquad (9) $$

If $\mu = 0$, one arrives at the known result for the non-correlated Brownian diffusion,

$$ D(L) = L\left(\frac{1}{4} - \nu^2\right). \qquad (10) $$

We will show that the distribution function $W_L(k)$ for the sequence determined by Eq. (5) (with nonzero but not
extremely close to $1/2 - \nu$ parameter $\mu$) is the Gaussian with the variance D(L) nonlinearly dependent on L. However,
at $\mu \to 1/2 - \nu$ the distribution function can differ strongly from the Gaussian.



Main equation 



For the stationary Markov chain, the probability $b(a_1 a_2 \ldots a_N)$ of occurring a certain word $(a_1, a_2, \ldots, a_N)$ satisfies
the condition of compatibility for the Chapman-Kolmogorov equation (see, for example, Ref. [16]):

$$ b(a_1 \ldots a_N) = \sum_{a=0,1} b(a\,a_1 \ldots a_{N-1})\,P(a_N \mid a, a_1, \ldots, a_{N-1}). \qquad (11) $$

Thus, we have $2^N$ homogeneous algebraic equations for the $2^N$ probabilities b of occurring the N-words and the
normalization equation $\sum b = 1$. This set of equations is equivalent to that of Eq. (11). In the case under consideration,
the set of Eqs. (11) can be substantially simplified owing to the following statement:

Proposition 1: The probability $b(a_1 a_2 \ldots a_N)$ depends on the number k of unities in the N-word only, i.e., it is
independent of the arrangement of symbols in the word $(a_1, a_2, \ldots, a_N)$.
FIG. 1: The probability b of occurring a word $(a_1, a_2, \ldots, a_N)$ vs its number z expressed in the binary code, $z = \sum_{i=1}^{N} a_i \cdot 2^{i-1}$,
for N = 8, $\mu = 0.1$, $\nu = 0.03$.



This statement, illustrated by Fig. 1, is valid owing to the chosen simple model (5) of the Markov chain. It
can be easily verified directly by substituting the solution (15) obtained below into the set of Eqs. (11). Note that
according to the Markov theorem, Eqs. (11) do not have other solutions [17].

Proposition 1 evidently leads to the very important property of isotropy: any word $(a_1, a_2, \ldots, a_L)$ appears with
the same probability as the inverted one, $(a_L, a_{L-1}, \ldots, a_1)$.

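Let us apply the set of Eqs. (11) to the word $(\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k})$:

$$ b(\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k}) = b(0\,\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k-1})\,p_k + b(1\,\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k-1})\,p_{k+1}. $$

This yields the recursion relation for $b(k) = b(\underbrace{11\ldots1}_{k}\,\underbrace{00\ldots0}_{N-k})$,

$$ b(k) = \frac{1 - p_{k-1}}{p_k}\,b(k-1) \qquad (12) $$

$$ \phantom{b(k)} = \frac{N - 2\nu N - 2\mu(N - 2k + 2)}{N + 2\nu N + 2\mu(N - 2k)}\,b(k-1). \qquad (13) $$

The probabilities b(k) for $\mu > 0$ satisfy the sequence of inequalities,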



M!K))<<fK,] + 'i — 



(14) 
which is the reflection of persistent properties for the chain.
The solution of Eq. (12) is

$$ b(k) = A\,\Gamma(n_1 + k)\,\Gamma(n_2 + N - k) \qquad (15) $$

with the parameters $n_1$ and $n_2$ defined by

$$ n_1 = \frac{N[1 - 2(\mu + \nu)]}{4\mu}, \qquad n_2 = \frac{N[1 - 2(\mu - \nu)]}{4\mu}. \qquad (16) $$

The constant A will be found below by normalizing the distribution function. Its value is

$$ A = \frac{\Gamma(n_1 + n_2)}{\Gamma(n_1)\,\Gamma(n_2)\,\Gamma(n_1 + n_2 + N)}. \qquad (17) $$



DISTRIBUTION FUNCTION OF L-WORDS



In this section we investigate the statistical properties of the Markov chain, specifically the distribution of the
words of definite length L by the number k of unities. The length L can also be interpreted as the number of jumps
of some particle over an integer-valued 1D lattice or as the time of the diffusion imposed by the Markov chain under
consideration. The form of the distribution function $W_L(k)$ depends, to a large extent, on the relation between the
word length L and the memory length N. Therefore, the first thing we will do is to examine the simplest case L = N.



Statistics of N-words



The value b(k) is the probability that an N-word contains k unities with a definite order of symbols $a_i$. Therefore,
the probability $W_N(k)$ that an N-word contains k unities with arbitrary order of symbols $a_i$ is b(k) multiplied by the
number $C_N^k = N!/[k!\,(N-k)!]$ of different permutations of k unities in the N-word,

$$ W_N(k) = C_N^k\,b(k). \qquad (18) $$

Combining Eqs. (15) and (18), we find the distribution function,

$$ W_N(k) = W_N(0)\,C_N^k\,\frac{\Gamma(n_1 + k)\,\Gamma(n_2 + N - k)}{\Gamma(n_1)\,\Gamma(n_2 + N)}. \qquad (19) $$

The normalization constant $W_N(0)$ can be obtained from the equality $\sum_{k=0}^{N} W_N(k) = 1$,

$$ W_N(0) = \frac{\Gamma(n_1 + n_2)\,\Gamma(n_2 + N)}{\Gamma(n_2)\,\Gamma(n_1 + n_2 + N)}. \qquad (20) $$

Comparing Eqs. (15), (18)-(20), one can get Eq. (17) for the constant A in Eq. (15).
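Equations (16), (19) and (20) are easy to evaluate numerically if the Gamma functions are handled in logarithmic form. The sketch below is an illustration under stated assumptions, not part of the original work: the names `n_params` and `word_distribution` are hypothetical, and `math.lgamma` is used only to avoid overflow at large N.

```python
from math import comb, exp, lgamma, log


def n_params(N, mu, nu):
    """Parameters n1, n2 of Eq. (16)."""
    n1 = N * (1 - 2 * (mu + nu)) / (4 * mu)
    n2 = N * (1 - 2 * (mu - nu)) / (4 * mu)
    return n1, n2


def word_distribution(N, n1, n2):
    """Return [W_N(0), ..., W_N(N)] from Eqs. (19) and (20)."""
    # log W_N(0), Eq. (20)
    log_w0 = (lgamma(n1 + n2) + lgamma(n2 + N)
              - lgamma(n2) - lgamma(n1 + n2 + N))
    dist = []
    for k in range(N + 1):
        # log of C_N^k * Gamma(n1+k) Gamma(n2+N-k) / (Gamma(n1) Gamma(n2+N))
        log_term = (log(comb(N, k)) + lgamma(n1 + k) + lgamma(n2 + N - k)
                    - lgamma(n1) - lgamma(n2 + N))
        dist.append(exp(log_w0 + log_term))
    return dist
```

For instance, `word_distribution(20, *n_params(20, 0.1, 0.03))` should sum to unity within rounding, and the choice `n1 = n2 = 1` should give a flat distribution equal to 1/(N + 1) for every k.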



Limiting case of weak persistence, $n_1, n_2 \gg 1$



In terms of the correlation parameter $\mu$, this limiting case corresponds to the values of $\mu$ not very close to 1/2,

1^E±A » 1. (21) 

This inequality can be rewritten via the /-function (see Eqs. © — iJBJ), 

»i (22, 



/(0)-/(l) W 
In the absence of correlations, $n_2 \to \infty$, Eq. (19) and the Stirling formula yield the Gaussian distribution at
$k,\ N n_1/n_2,\ N - k \gg 1$, $|k - k_0| \ll N$. Given the persistence is not too strong,

$$ n_2 \gg 1, \qquad (23) $$

one can also obtain the Gaussian form for the distribution function,

$$ W_N(k) = \frac{1}{\sqrt{2\pi D(N)}} \exp\left\{-\frac{(k - k_0)^2}{2D(N)}\right\}, \qquad (24) $$

with the $\mu$-dependent variance,

$$ D(N) = \frac{N(N + n_1 + n_2)\,n_1 n_2}{(n_1 + n_2)^3} = \frac{N}{4(1 - 2\mu)}\left[1 - \frac{4\nu^2}{(1 - 2\mu)^2}\right], \qquad (25) $$

and

$$ k_0 = \frac{n_1 N}{n_1 + n_2} = \frac{N}{2}\left[1 - \frac{2\nu}{1 - 2\mu}\right]. \qquad (26) $$



It follows from Eq. (24) that N-words containing $k_0$ unities are the most probable. It is interesting to note that
the persistence leads to a decrease of the variance $D(N, \mu > 0)$ with respect to $D(N, \mu = 0) = N(1/4 - \nu^2)$ if

$$ \nu > \frac{1 - 2\mu}{2\sqrt{3 - 6\mu + 4\mu^2}}. \qquad (27) $$

Otherwise, for instance at $\nu = 0$, the persistence results in an increase of the variance $D(N, \mu)$. To put it differently,
the persistence is conducive to the intensification of the diffusion under conditions opposite to inequality (27).

The inequality $n_2 \gg 1$ gives $D(N) \ll N^2$. Therefore, despite the increase of D(N), the fluctuations of $|k - k_0|$ of the
order of N are exponentially small.

Intermediate case, $n_2 \gtrsim 1$

If the parameters $n_1$ and $n_2$ are integers of the order of unity, the distribution function $W_N(k)$ is a polynomial of
degree $n_1 + n_2 - 2$. In particular, at $n_1 = n_2 = 1$, the function $W_N(k)$ is constant,

$$ W_N(k) = \frac{1}{N + 1}. \qquad (28) $$

At $n_1 \neq 1$, $W_N(k)$ has a maximum within the interval [0, N]. At $n_1 = 1$ and $n_2 > 1$, $W_N(k)$ decreases monotonically
with an increase of k.

Limiting case of strong persistence
If the parameter $n_2$ satisfies the inequality,

$$ n_2 \ll \ln^{-1} N, \qquad (29) $$

$$ 1 - 2(\mu - \nu) \ll \frac{1}{N \ln N}, \qquad f(1) \ll \frac{1}{N \ln N}, \qquad (30) $$

then one can neglect the parameters $n_1$ and $n_2$ in the arguments of the functions $\Gamma(n_1 + k)$, $\Gamma(n_2 + N)$, and $\Gamma(n_2 + N - k)$
in Eq. (19). In this case, the distribution function $W_N(k)$ assumes its maximal values at k = 0 and k = N,

$$ W_N(1) = W_N(0)\,\frac{n_1 N}{N - 1} \ll W_N(0). \qquad (31) $$
Formula (31) describes the sharply decreasing $W_N(k)$ as k varies from 0 to 1 (and from N to N - 1). Then, at
$1 < k < N/2$, the function $W_N(k)$ decreases more slowly with an increase in k,

$$ W_N(k) = W_N(0)\,\frac{n_1 N}{k(N - k)}. \qquad (32) $$

At k = N/2, the probability $W_N(k)$ achieves its minimal value,

$$ W_N\left(\frac{N}{2}\right) = W_N(0)\,\frac{4 n_1}{N}. \qquad (33) $$

It follows from normalization (20) that the values $W_N(0)$ and $W_N(N)$ are approximately equal to $n_2/(n_1 + n_2)$
and $n_1/(n_1 + n_2)$, respectively. Neglecting the terms of the order of $n^2$, one gets

$$ W_N(0) = \frac{n_2}{n_1 + n_2}\,(1 - n_1 \ln N), \qquad (34) $$

$$ W_N(N) = \frac{n_1}{n_1 + n_2}\,(1 - n_2 \ln N). \qquad (35) $$

In the straightforward calculation using Eqs. (8) and (32) the variance D is

$$ D(N) = \frac{n_1 n_2 N^2}{(n_1 + n_2)^2} - \frac{n_1 n_2 N(N - 1)}{n_1 + n_2}. \qquad (36) $$

Thus, the variance D(N) is equal to $n_1 n_2 N^2/(n_1 + n_2)^2$ in the leading approximation in the parameter n. This fact
has a simple explanation. The probability of occurrence of the N-word containing N unities is approximately equal
to $n_1/(n_1 + n_2)$. So, the relations $\overline{k^2} \approx n_1 N^2/(n_1 + n_2)$ and $\bar{k}^2 \approx n_1^2 N^2/(n_1 + n_2)^2$ give (36). The case of strong
persistence corresponds to the so-called ballistic regime of diffusion: if we choose randomly some symbol in the
sequence, it will be surrounded by the same symbols with the probability close to unity.

The evolution of the distribution function $W_N(k)$ from the Gaussian form to the inverse one with a decrease of
the parameters $n_1$ and $n_2$ is shown in Fig. 2. In the interval $\ln^{-1} N < n_2 < 1$ the curve $W_N(k)$ is concave and the
maximum of the function $W_N(k)$ inverts into a minimum. At $N \gg 1$ and $\ln^{-1} N < n_2 < 1$, the curve remains a smooth
function of its argument k, as shown by the curve with n = 0.5 in Fig. 2. Below, we will not consider this relatively narrow
region of the change in the parameter $n_2$.

Formulas (24), (25), (32) and (34)-(36) describe the statistical properties of L-words for the fixed "diffusion
time" L = N. Below, we examine the distribution function $W_L(k)$ for the more general situation, L < N.
FIG. 2: The distribution function $W_N(k)$ for N = 20 and different values of the parameters $n_1$ and $n_2$ shown near the curves.
Statistics of L-words with L < N



Distribution function $W_L(k)$



The distribution function at L < N can be given as

$$ W_L(k) = \sum_{i=k}^{k+N-L} b(i)\,C_L^k\,C_{N-L}^{i-k}. \qquad (37) $$

This equation follows from the consideration of N-words consisting of two parts,

$$ (\underbrace{a_1, \ldots, a_{N-L}}_{i-k\ \text{unities}},\ \underbrace{a_{N-L+1}, \ldots, a_N}_{k\ \text{unities}}). \qquad (38) $$

The total number of unities in this word is i. The right-hand part of the word (L-sub-word) contains k unities. The
remaining (i - k) unities are situated within the left-hand part of the word (within the (N - L)-sub-word). The multiplier
$C_L^k C_{N-L}^{i-k}$ in Eq. (37) takes into account all possible permutations of the symbols "1" within the N-word on condition
that the L-sub-word always contains k unities. Then we perform the summation over all possible values of the number
i. Note that Eq. (37) is a direct consequence of Proposition 1 formulated in Subsec. C of the previous section.

The straightforward summation in Eq. (37) yields the following formula that is valid at any value of the parameters
$n_1$ and $n_2$:

$$ W_L(k) = W_L(0)\,C_L^k\,\frac{\Gamma(n_1 + k)\,\Gamma(n_2 + L - k)}{\Gamma(n_1)\,\Gamma(n_2 + L)}, \qquad (39) $$

where

$$ W_L(0) = \frac{\Gamma(n_1 + n_2)\,\Gamma(n_2 + L)}{\Gamma(n_2)\,\Gamma(n_1 + n_2 + L)}. \qquad (40) $$

It is of interest to note that the parameters $\mu$, $\nu$ and the memory length N enter Eqs. (39), (40) via
the parameters $n_1$ and $n_2$ only. This means that the statistical properties of the L-words with L < N are defined by
these "combined" parameters.

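In the limiting case of weak persistence, $n_2 \gg 1$, at $k,\ L n_1/n_2,\ L - k \gg 1$, Eq. (39) along with the Stirling formula
gives the Gaussian distribution function,

$$ W_L(k) = \frac{1}{\sqrt{2\pi D(L)}} \exp\left\{-\frac{(k - k_0)^2}{2D(L)}\right\}, \qquad (41) $$

with the variance D(L)

$$ D(L) = \frac{n_1 n_2 L}{(n_1 + n_2)^2}\left[1 + \frac{L}{n_1 + n_2}\right] = \frac{L}{4}\left[1 - \frac{4\nu^2}{(1 - 2\mu)^2}\right]\left[1 + \frac{2\mu L}{N(1 - 2\mu)}\right] \qquad (42) $$

and

$$ k_0 = \frac{n_1 L}{n_1 + n_2} = \frac{L}{2}\left[1 - \frac{2\nu}{1 - 2\mu}\right]. \qquad (43) $$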



In the case of strong persistence (29), the asymptotic expression for the distribution function Eq. (39) can be written
as

$$ W_L(k) = W_L(0)\,\frac{n_1 L}{k(L - k)}, \qquad k \neq 0,\ k \neq L, \qquad (44) $$

$$ W_L(0) = \frac{n_2}{n_1 + n_2}\,(1 - n_1 \ln L), \qquad W_L(L) = \frac{n_1}{n_1 + n_2}\,(1 - n_2 \ln L). \qquad (45) $$

Both the distribution $W_L(k)$ (44) and the function $W_N(k)$ have concave forms. The former assumes the
maximal values (45) at the edges of the interval [0, L] and has a minimum at k = L/2.
Variance D(L)

Using the definition Eq. (8) and the distribution function Eq. (39) one can obtain a very simple formula for the
variance D(L),

$$ D(L) = \frac{L n_1 n_2}{(n_1 + n_2)^2}\left[1 + \frac{L - 1}{n_1 + n_2 + 1}\right] = \frac{L}{4}\left[1 - \frac{4\nu^2}{(1 - 2\mu)^2}\right]\left[1 + \frac{2\mu(L - 1)}{N - 2\mu(N - 1)}\right]. \qquad (46) $$

Eq. (46) shows that the variance D(L) obeys the parabolic law independently of the correlation strength in the Markov
chain.
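The two closed forms of the variance, one written through the combined parameters and one through the chain parameters, can be compared numerically. The snippet below is a minimal sketch with hypothetical function names; it assumes the definitions of n1 and n2 given in Eq. (16).

```python
def variance_from_n(L, n1, n2):
    """Variance D(L) written through the combined parameters n1, n2."""
    s = n1 + n2
    return L * n1 * n2 / s**2 * (1 + (L - 1) / (s + 1))


def variance_from_mu_nu(L, N, mu, nu):
    """The same variance written through N, mu and nu."""
    bias = 1 - 4 * nu**2 / (1 - 2 * mu)**2
    growth = 1 + 2 * mu * (L - 1) / (N - 2 * mu * (N - 1))
    return L / 4 * bias * growth


def n_params(N, mu, nu):
    """Parameters n1, n2 of Eq. (16)."""
    return (N * (1 - 2 * (mu + nu)) / (4 * mu),
            N * (1 - 2 * (mu - nu)) / (4 * mu))
```

With N = 100, mu = 0.4 and nu = 0.08 (so that n1 = 2.5 and n2 = 22.5), both functions should return the same value for any L, and the growth with L is quadratic rather than linear.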

In the case of weak persistence, at $n_2 \gg 1$, we obtain the asymptotics of Eq. (42). It allows one to analyze the
behavior of the variance D(L) with an increase in the "diffusion time" L. At small $\mu$, the dependence D(L) follows
the classical law of the Brownian diffusion, $D(L) \approx L(1/4 - \nu^2)$.

For the case of strong persistence, $n_2 \ll 1$, Eq. (46) gives the asymptotics,

$$ D(L) = \frac{n_1 n_2 L^2}{(n_1 + n_2)^2} - \frac{n_1 n_2 L(L - 1)}{n_1 + n_2}. \qquad (47) $$

The ballistic regime of diffusion leads to the quadratic law of the D(L) dependence in the zero approximation in the
parameter $n_2 \ll 1$.

The unusual behavior of the variance D(L) raises an issue as to what particular type of the diffusion equation
corresponds to the nonlinear dependence D(L) in Eq. (42). In the following subsection, when solving this problem,
we will obtain the conditional probability $p^{(0)}$ of occurring the symbol zero after a given L-word with L < N. The
ability to find $p^{(0)}$, with some reduced information about the preceding symbols being available, is very important for
the study of the self-similarity of the Markov chain (see Subsubsec. 4 of this Subsection).



Generalized diffusion equation at L < N, $n_2 \gg 1$

It is quite obvious that the distribution $W_L(k)$ satisfies the equation

$$ W_{L+1}(k) = W_L(k)\,p^{(0)}(k) + W_L(k - 1)\,p^{(1)}(k - 1). \qquad (48) $$

Here $p^{(0)}(k)$ is the probability of occurring "0" after an average-statistical L-word containing k unities and $p^{(1)}(k - 1)$
is the probability of occurring "1" after an L-word containing (k - 1) unities. At L < N, the probability $p^{(0)}(k)$ can
be written as

$$ p^{(0)}(k) = \frac{1}{W_L(k)} \sum_{i=k}^{k+N-L} p_i\,b(i)\,C_L^k\,C_{N-L}^{i-k}. \qquad (49) $$

The product $b(i)\,C_L^k\,C_{N-L}^{i-k}$ in this formula represents the conditional probability of occurring the N-word containing
i unities, the right-hand part of which, the L-sub-word, contains k unities (compare with Eqs. (37), (38)).

The product $b(i)\,C_{N-L}^{i-k}$ in Eq. (49) is a sharp function of i with a maximum at some point $i = i_0$ whereas $p_i$ obeys
the linear law (5). This implies that $p_i$ can be factored out of the summation sign being taken at point $i = i_0$. The
asymptotical calculation shows that point $i_0$ is described by the equation,

$$ i_0 = \frac{N}{2}\left(1 - \frac{2\nu}{1 - 2\mu}\right) - \frac{L/2}{1 - 2\mu(1 - L/N)}\left(1 - \frac{2\nu}{1 - 2\mu} - \frac{2k}{L}\right). \qquad (50) $$

Expression (5) taken at point $i_0$ gives the desired formula for $p^{(0)}$ because

$$ \sum_{i=k}^{k+N-L} b(i)\,C_L^k\,C_{N-L}^{i-k} \qquad (51) $$

is obviously equal to $W_L(k)$. Thus, we have

$$ p^{(0)}(k) = \frac{1}{2} + \frac{\nu}{1 - 2\mu} + \frac{\mu L}{N - 2\mu(N - L)}\left(1 - \frac{2k}{L} - \frac{2\nu}{1 - 2\mu}\right). \qquad (52) $$



Let us consider a very important point relating to Eq. (50). If the concentration of unities in the right-hand part
of the word (38) is higher than $1/2 - \nu/(1 - 2\mu)$, $k/L > 1/2 - \nu/(1 - 2\mu)$, then the most probable concentration
$(i_0 - k)/(N - L)$ of unities in the left-hand part of this word is likewise increased, $(i_0 - k)/(N - L) > 1/2 - \nu/(1 - 2\mu)$.
At the same time, the concentration $(i_0 - k)/(N - L)$ is less than k/L,

$$ \frac{1}{2}\left(1 - \frac{2\nu}{1 - 2\mu}\right) < \frac{i_0 - k}{N - L} < \frac{k}{L}. \qquad (53) $$

This implies that the increased concentration of unities in the L-words is necessarily accompanied by the existence
of a certain tail with an increased concentration of unities as well. We refer to such a phenomenon as
macro-persistence. An analysis performed in the following section will indicate that the correlation length $l_c$ of this
tail is $\gamma N$ with $\gamma > 1$ dependent on the parameters $\mu$ and $\nu$ only. It is evident from the above-mentioned property of
the isotropy of the Markov chain that there are two correlation tails from both sides of the L-word.

Note that the distribution $W_L(k)$ is a smooth function of arguments k and L near its maximum in the case of weak
persistence and $k,\ L - k,\ L n_1/n_2 \gg 1$. By going over to the continuous limit in Eq. (48) and using Eq. (52) with
the relation $p^{(1)}(k - 1) = 1 - p^{(0)}(k - 1)$, we obtain the diffusion Fokker-Planck equation for the correlated Markov
process,



dL 8 dn 2 \ (1 - 2/i) 2 J (1 - 2(i)N + 2/iL dn 

where $\kappa = k - L/2$. Equation (54) has a solution of the Gaussian form Eq. (41) with the variance D(L) satisfying the
ordinary differential equation,

$$ \frac{dD}{dL} = \frac{1}{4}\left[1 - \frac{4\nu^2}{(1 - 2\mu)^2}\right] + \frac{4\mu}{(1 - 2\mu)N + 2\mu L}\,D. \qquad (55) $$

Its solution, given the boundary condition D(0) = 0, coincides with (42).
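The statement that the weak-persistence variance solves this differential equation can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the variance is taken in the form of Eq. (42) written through N, mu and nu, and the derivative is approximated by a central difference.

```python
def variance_weak(L, N, mu, nu):
    """Weak-persistence variance D(L), Eq. (42), in terms of N, mu, nu."""
    c = 0.25 * (1 - 4 * nu**2 / (1 - 2 * mu)**2)
    return c * L * (1 + 2 * mu * L / (N * (1 - 2 * mu)))


def ode_rhs(L, D, N, mu, nu):
    """Right-hand side of the equation for dD/dL, Eq. (55)."""
    c = 0.25 * (1 - 4 * nu**2 / (1 - 2 * mu)**2)
    return c + 4 * mu * D / ((1 - 2 * mu) * N + 2 * mu * L)


def ode_residual(L, N, mu, nu, h=1e-3):
    """Central-difference estimate of dD/dL minus the right-hand side."""
    deriv = (variance_weak(L + h, N, mu, nu)
             - variance_weak(L - h, N, mu, nu)) / (2 * h)
    return deriv - ode_rhs(L, variance_weak(L, N, mu, nu), N, mu, nu)
```

Because D(L) is a quadratic polynomial in L, the central difference is exact up to rounding, so `ode_residual` should be negligibly small for any admissible parameters, and `variance_weak(0, ...)` satisfies the boundary condition D(0) = 0.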



Self-similarity of the persistent Brownian diffusion

In this subsection, we point to one of the most interesting properties of the Markov chain being considered, namely,
its self-similarity. Let us reduce the N-step Markov sequence by regularly (or randomly) removing some symbols and
introduce the decimation parameter $\lambda$,

$$ \lambda = N^*/N < 1. \qquad (56) $$

Here $N^*$ is a renormalized memory length for the reduced $N^*$-step Markov chain. According to Eq. (52), the
conditional probability $p^*_k$ of occurring the symbol zero after k unities among the preceding $N^*$ symbols is described
by the formula,

$$ p^*_k = \frac{1}{2} + \nu^* + \mu^*\left(1 - \frac{2k}{N^*}\right), \qquad (57) $$

with

$$ N^* = \lambda N, \qquad \nu^* = \frac{\nu}{1 - 2\mu(1 - \lambda)}, \qquad \mu^* = \frac{\mu\lambda}{1 - 2\mu(1 - \lambda)}. \qquad (58) $$

The comparison between Eqs. (5) and (57) shows that the reduced chain possesses the same statistical properties
as the initial one but is characterized by the renormalized parameters $(N^*, \nu^*, \mu^*)$ instead of $(N, \nu, \mu)$. Thus,
Eqs. (56) and (58) determine the one-parametrical renormalization of the parameters of the stochastic process defined
by Eq. (5).

The astonishing property of the reduced sequence consists in that the variance $D^*(L)$ is invariant with respect to
the one-parametric decimation transformation (56), (58). In other words, it coincides with the function D(L) for the
initial Markov chain:

$$ D^*(L) = \frac{L n_1^* n_2^*}{(n_1^* + n_2^*)^2}\left[1 + \frac{L - 1}{n_1^* + n_2^* + 1}\right] = D(L), \qquad L < N^*. \qquad (59) $$

Indeed, according to Eqs. (16), (58), the renormalized parameters $n_1^* = N^*[1 - 2(\mu^* + \nu^*)]/4\mu^*$ and
$n_2^* = N^*[1 - 2(\mu^* - \nu^*)]/4\mu^*$ of the reduced sequence coincide exactly with the parameters $n_1$ and $n_2$ of the initial Markov chain.
Since the shape of the function $W_L(k)$, Eq. (39), is defined by the invariant parameters $n_1 = n_1^*$ and $n_2 = n_2^*$, the
distribution $W_L(k)$ is also invariant with respect to the decimation transformation.

The transformation $(N, \nu, \mu) \to (N^*, \nu^*, \mu^*)$, Eqs. (56), (58), possesses the properties of a semi-group, i.e., the composition
of transformations $(N, \nu, \mu) \to (N^*, \nu^*, \mu^*)$ and $(N^*, \nu^*, \mu^*) \to (N^{**}, \nu^{**}, \mu^{**})$ with transformation parameters
$\lambda_1$ and $\lambda_2$ is likewise a transformation from the same semi-group, $(N, \nu, \mu) \to (N^{**}, \nu^{**}, \mu^{**})$, with parameter
$\lambda = \lambda_1 \lambda_2$.
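Both the invariance of the combined parameters and the semi-group property can be verified with a few lines of code. The following is a minimal sketch under stated assumptions: the function names `decimate` and `n_params` are hypothetical, the renormalized memory length is kept as a real number, and the formulas are those of the decimation transformation and of Eq. (16).

```python
def decimate(N, nu, mu, lam):
    """Renormalized parameters (N*, nu*, mu*) for decimation parameter lam."""
    denom = 1 - 2 * mu * (1 - lam)
    return lam * N, nu / denom, mu * lam / denom


def n_params(N, nu, mu):
    """Combined parameters n1, n2 of Eq. (16)."""
    n1 = N * (1 - 2 * (mu + nu)) / (4 * mu)
    n2 = N * (1 - 2 * (mu - nu)) / (4 * mu)
    return n1, n2
```

Applying `decimate` twice with parameters 0.5 and 0.4 should give the same triple as a single call with parameter 0.2, and `n_params` should return the same pair before and after any decimation.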

The invariance of the function D(L) at L < N was referred to by us as the phenomenon of self-similarity. It is
demonstrated in Fig. 3.

It is interesting to note that the property of self-similarity is valid for any strength of the persistency. Indeed, the
result Eq. (52) can be obtained directly from Eqs. (15)-(17) and (49) not only for $n_2 \gg 1$ but also for an arbitrary
value of $n_2$.

FIG. 3: The dependence of the variance D on the tuple length L for the generated sequence with N = 100, $\mu = 0.4$ and
$\nu = 0.08$ (dotted line) and for the decimated sequences (the parameter of decimation $\lambda = 0.5$). Squares and circles correspond
to the stochastic and deterministic reduction, respectively. The solid line describes the non-correlated Brownian diffusion,
$D(L) = L(1/4 - \nu^2)$.



MEMORY FUNCTION AND ITS CONNECTION WITH CORRELATION FUNCTION 



Typically, the correlation function and other moments are employed as the input characteristics for the description
of correlated random sequences. However, the correlation function describes not only the direct interconnection
of the elements $a_i$ and $a_{i+r}$, but also takes into account their indirect interaction via all other intermediate elements.
Our approach operates with the "origin" characteristics of the system, specifically, with the memory function. The
correlation and memory functions are mutually complementary characteristics of a random sequence in the following
sense. The numerical analysis of a given random sequence enables one to directly determine the correlation function
rather than the memory function. On the other hand, it is possible to construct a random sequence using the memory
function, but not the correlation one. Therefore, we believe that the investigation of the memory function of correlated
systems will permit one to disclose their intrinsic properties, which provide the correlations between the elements.
The memory function used in earlier studies was characterized by step-like behavior and defined by two parameters
only: the memory depth N and the strength of the symbols' correlations. Such a memory function describes only one
type of correlations in a given system, the persistent or anti-persistent one, which results in the super- or sub-linear
dependence D(L) [22]. Obviously, both types of correlations can be observed at different scales in the same system.
Thus, one needs to use more complex memory functions for a detailed description of systems with both types of
correlations. Besides, we have to find a relation connecting the mutually complementary characteristics of a random
sequence, the memory and correlation functions.

Main equation 

Let us rewrite Eq. Q in an equivalent form, 

JV 

P( ai = 1 | T Nti ) = b + J2 F(r)(di-r ~ b), (60) 

r=l 

with 

JV 

Zf(0,r)/N 

b= r ~^ , F(r) = -[f(l,r)-f(Q,r)]. (61) 

l-£F(r) 

r=l 

The constant b is the value of averaged over the whole sequence, b = a: 

M 



a = lim , . 

M^oo 2M , .. 

i=-M 

Indeed, according to the ergodicity of the Markov chain, a coincides with the value of eij averaged over the ensemble 
of realizations of the Markov chain. So, we can write 

a = Pr( ai = 1) = = 1 I T N ,i)Pr(T N ,i). (63) 

Tjv.i 

Here $\Pr(a_i = 1)$ is the probability that the symbol $a_i$ equals unity and $\Pr(T_{N,i})$ is the probability of
occurrence of the definite word $T_{N,i}$ in the considered ensemble of sequences. Substituting $P(a_i = 1 \mid T_{N,i})$
from Eq. (60) into Eq. (63) and taking into account the obvious relation $\sum_{T_{N,i}} \Pr(T_{N,i}) = 1$, one gets

$$\bar{a} = b - b \sum_{r=1}^{N} F(r) + \sum_{r=1}^{N} F(r) \sum_{T_{N,i}} \Pr(T_{N,i})\, a_{i-r}. \tag{64}$$

The sum $\sum_{T_{N,i}} \Pr(T_{N,i})\, a_{i-r}$ does not depend on the subscript r and obviously coincides with $\bar{a}$.
So, we have $\bar{a} = b + (\bar{a} - b) \sum_{r} F(r)$. From this equation we conclude that $b = \bar{a}$. Thus, we can
rewrite Eq. (60) as

$$P(a_i = 1 \mid T_{N,i}) = \bar{a} + \sum_{r=1}^{N} F(r)\,(a_{i-r} - \bar{a}). \tag{65}$$

We refer to F(r) as the memory function (MF). It describes the strength of influence of the previous symbol $a_{i-r}$ upon
the generated one, $a_i$. To the best of our knowledge, the concept of memory function for many-step Markov chains was
introduced in Ref. The function P(. | .) contains the complete information about correlation properties of the 
Markov chain. 
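As an illustration of how Eq. (65) can be used to generate a chain, the following is a minimal sketch rather than the authors' implementation. The function name `generate_additive_chain`, the `seed` parameter, and the choice to draw the first N symbols independently with mean $\bar{a}$ are assumptions made for this example.

```python
import numpy as np

def generate_additive_chain(F, a_bar, length, seed=0):
    """Generate a binary sequence with conditional probability
    P(a_i = 1 | previous N symbols) = a_bar + sum_r F(r) (a_{i-r} - a_bar)."""
    F = np.asarray(F, dtype=float)          # F[r-1] holds F(r), r = 1..N
    N = len(F)

    # The conditional probability must stay inside [0, 1] for every word.
    pos = F[F > 0].sum()
    neg = -F[F < 0].sum()
    p_max = a_bar + pos * (1.0 - a_bar) + neg * a_bar
    p_min = a_bar - pos * a_bar - neg * (1.0 - a_bar)
    if p_min < 0.0 or p_max > 1.0:
        raise ValueError("memory function gives probabilities outside [0, 1]")

    rng = np.random.default_rng(seed)
    a = np.empty(length + N, dtype=np.int8)
    # Assumption: the initial word is drawn independently with mean a_bar.
    a[:N] = rng.random(N) < a_bar
    F_rev = F[::-1]                         # aligns F(N)..F(1) with a[i-N]..a[i-1]
    for i in range(N, length + N):
        p = a_bar + F_rev @ (a[i - N:i] - a_bar)
        a[i] = rng.random() < p
    return a[N:]                            # drop the arbitrary initial word

# Example: a two-step persistent chain without bias.
chain = generate_additive_chain([0.3, 0.2], a_bar=0.5, length=20000, seed=1)
print(chain[:20], chain.mean())
```

A positive F(r) makes a unity at distance r more likely to be followed by a unity (persistence), while a negative F(r) has the opposite effect.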

We suggest below two methods for finding the memory function F(r) of a random binary sequence with a known
correlation function. The first one is based on the minimization of a "distance" Dist between the Markov chain
generated by means of a sought-for MF and the initial sequence of symbols. This distance is determined by the
formula



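$$\mathrm{Dist} = \overline{\left(a_i - P(a_i = 1 \mid T_{N,i})\right)^2} = \lim_{M\to\infty} \frac{1}{2M+1} \sum_{i=-M}^{M} \left(a_i - P(a_i = 1 \mid T_{N,i})\right)^2, \tag{66}$$

with the conditional probability P defined by Eq. (65).

Let us express the distance (66) in terms of the correlation function,

$$K(r) = \overline{a_i a_{i+r}} - \bar{a}^2, \qquad K(0) = \bar{a}(1 - \bar{a}), \qquad K(-r) = K(r). \tag{67}$$

From Eqs. (65) and (66), one obtains

$$\begin{aligned}
\mathrm{Dist} &= \sum_{r,r'=1}^{N} \overline{(a_{i-r} - \bar{a})(a_{i-r'} - \bar{a})}\,F(r)F(r') - 2\sum_{r=1}^{N} \overline{(a_i - \bar{a})(a_{i-r} - \bar{a})}\,F(r) + \overline{(a_i - \bar{a})^2} \\
&= \sum_{r,r'=1}^{N} K(r - r')F(r)F(r') - 2\sum_{r=1}^{N} K(r)F(r) + K(0).
\end{aligned} \tag{68}$$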



The minimization equation,

$$\frac{\delta\,\mathrm{Dist}}{\delta F(r)} = 2\sum_{r'=1}^{N} K(r - r')F(r') - 2K(r) = 0, \tag{69}$$

yields the relationship between the correlation and memory functions,

$$K(r) = \sum_{r'=1}^{N} F(r')K(r - r'), \qquad r \geq 1. \tag{70}$$



Equation (70) can also be derived by straightforward calculation of the average $\overline{a_i a_{i+r}}$ in Eq. (67) using
definition (65) of the memory function.
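Written for r = 1, ..., N, Eq. (70) is a set of N linear equations for F(1), ..., F(N) whose matrix has the elements K(r - r'). The sketch below solves it with a dense solver. The function name `memory_from_correlation` is hypothetical, and the geometric K(r) used in the example is only a test input, chosen because its memory function is known in closed form.

```python
import numpy as np

def memory_from_correlation(K, N):
    """Solve K(r) = sum_{r'=1..N} F(r') K(r - r'), r = 1..N, for F.

    K is an array with K[r] = K(r) for r = 0..N (K is even in r).
    """
    K = np.asarray(K, dtype=float)
    if len(K) < N + 1:
        raise ValueError("need K(0), ..., K(N)")
    r = np.arange(1, N + 1)
    # Matrix element (r, r') is K(r - r') = K(|r - r'|).
    A = K[np.abs(r[:, None] - r[None, :])]
    return np.linalg.solve(A, K[1:N + 1])

# Test input: K(r) = K(0) * 0.3**r corresponds to a one-step memory F(1) = 0.3.
K = 0.25 * 0.3 ** np.arange(0, 4)
print(memory_from_correlation(K, N=3))
```

For the geometric input the solver returns F(1) = 0.3 and vanishing F(2), F(3), as Eq. (70) requires.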



FIG. 4: The initial memory function, Eq. (72) (solid line), and the reconstructed one (dots) vs the distance r. In the inset, the
correlation function K(r) obtained by a numerical analysis of the sequence constructed by means of the memory function
Eq. (72).

FIG. 5: The model correlation function K(r) described by Eq. (73) (solid line). The dots correspond to the reconstructed
correlation function. In the inset, the memory function F(r) obtained by numerical solution of Eq. (70) with the correlation
function Eq. (73).

The second method, which results from the first one, establishes a relationship between the memory function F(r) and
the variance D(L),

$$M(r, 0) = \sum_{r'=1}^{N} F(r')M(r, r'), \tag{71}$$

$$M(r, r') = D(r - r') - \left(D(-r') + r\,[D(-r' + 1) - D(-r')]\right).$$

It is a set of linear equations for F(r) with coefficients M(r, r') determined by D(r). The relations
K(r) = [D(r - 1) - 2D(r) + D(r + 1)]/2, obtained in Ref. 9], and D(-r) = D(r) are used here.
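The linear system of Eq. (71) can be sketched numerically as follows. Two points are assumptions of this example and not statements from the text. First, the rows are taken at r = 2, ..., N + 1, because for r = 1 every M(1, r') vanishes identically. Second, the test variance D(L) is built from a geometric K(r) through the standard identity D(L) = L K(0) + 2 Σ_{j<L} (L - j) K(j), which is consistent with the second-difference relation quoted above. The function names are hypothetical.

```python
import numpy as np

def variance_from_correlation(K, L_max):
    """D(L), L = 0..L_max, for a stationary sequence with correlations K(0..L_max-1)."""
    K = np.asarray(K, dtype=float)
    D = np.zeros(L_max + 1)
    for L in range(1, L_max + 1):
        j = np.arange(1, L)
        D[L] = L * K[0] + 2.0 * np.sum((L - j) * K[j])
    return D

def memory_from_variance(D, N):
    """Solve M(r,0) = sum_{r'} F(r') M(r,r') for F(1..N); needs D(0..N+1)."""
    D = np.asarray(D, dtype=float)

    def D_even(x):
        return D[abs(x)]                    # D(-x) = D(x)

    def M(r, rp):
        return D_even(r - rp) - (D_even(-rp) + r * (D_even(-rp + 1) - D_even(-rp)))

    rows = range(2, N + 2)                  # r = 1 gives the trivial identity 0 = 0
    A = np.array([[M(r, rp) for rp in range(1, N + 1)] for r in rows])
    rhs = np.array([M(r, 0) for r in rows])
    return np.linalg.solve(A, rhs)

# Test input: geometric correlations, whose memory function is F(1) = 0.3.
N = 3
K = 0.25 * 0.3 ** np.arange(0, N + 1)
D = variance_from_correlation(K, N + 1)
print(memory_from_variance(D, N))
```

Working with D(L) instead of K(r) can be convenient when the variance is the quantity measured directly, as in the analysis of texts.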

Let us verify the robustness of our method by numerical simulations. We consider a model "triangle" memory 
function, 



$$F(r) = 0.008 \begin{cases} r, & 1 \leq r < 10, \\ 20 - r, & 10 \leq r < 20, \\ 0, & r \geq 20, \end{cases} \tag{72}$$



presented in Fig. 4 by the solid line. Using Eq. (65), we construct a random non-biased, $\bar{a} = 1/2$, sequence of symbols
{0, 1}. Then, with the aid of the constructed binary sequence of length $10^6$, we calculate numerically the correlation
function K(r). The result of these calculations is presented in the inset of Fig. 4. One can see that the correlation function
K(r) roughly mimics the memory function F(r) over the region 1 ≤ r < 20. In the region r > 20, the memory function
is equal to zero but the correlation function does not vanish [23]. Then, using the obtained correlation function K(r),
we solve Eq. (70) numerically. The result is shown in Fig. 4 by dots. One can see good agreement between the initial, Eq. (72),
and reconstructed memory functions F(r).
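The statement that K(r) does not vanish beyond the memory depth can be checked without simulation, because Eq. (70) holds for all r ≥ 1. The sketch below first solves it for K(1), ..., K(N) with K(0) = $\bar{a}(1 - \bar{a})$ and then continues it recursively to larger r. The branch values r and 20 - r of the triangle function are an assumption of this example (they reproduce a peak of 0.08 at r = 10), and the function names are hypothetical.

```python
import numpy as np

def triangle_memory(r, amp=0.008):
    """Triangle memory function; the branches r and 20 - r are assumed here."""
    if 1 <= r < 10:
        return amp * r
    if 10 <= r < 20:
        return amp * (20 - r)
    return 0.0

def correlation_from_memory(F, a_bar, r_max):
    """K(0..r_max) of the additive chain with memory function F(1..N)."""
    F = np.asarray(F, dtype=float)
    N = len(F)
    K0 = a_bar * (1.0 - a_bar)
    A = np.eye(N)
    rhs = np.zeros(N)
    for r in range(1, N + 1):
        for rp in range(1, N + 1):
            lag = abs(r - rp)
            if lag == 0:
                rhs[r - 1] += F[rp - 1] * K0       # term with the known K(0)
            else:
                A[r - 1, lag - 1] -= F[rp - 1]     # term with the unknown K(lag)
    K = np.empty(max(r_max, N) + 1)
    K[0] = K0
    K[1:N + 1] = np.linalg.solve(A, rhs)
    for r in range(N + 1, len(K)):                 # Eq. (70) continued beyond r = N
        K[r] = sum(F[rp - 1] * K[r - rp] for rp in range(1, N + 1))
    return K

F = [triangle_memory(r) for r in range(1, 21)]
K = correlation_from_memory(F, a_bar=0.5, r_max=40)
print(K[20:26])                                    # tail beyond the memory depth
```

Since every F(r) in this example is non-negative, the correlations computed for r > 20 stay positive: they are carried by chains of intermediate symbols even though the direct memory is absent.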



Numerical simulations 



The main and very nontrivial result of our paper consists in the ability to construct a binary sequence with an
arbitrary prescribed correlation function by means of Eq. (70). As an example, let us consider the model correlation
function,

$$K(r) = 0.1\,\frac{\sin(r)}{r}, \tag{73}$$



presented by the solid line in Fig. 5. We solve Eq. (70) numerically to find the memory function F(r) using this
correlation function. The result is presented in the inset of Fig. 5. Then we construct the binary Markov chain using the
obtained memory function F(r). To check the robustness of the method, we calculate the correlation function K(r)
of the constructed chain (the dots in Fig. 5) and compare it with Eq. (73). One can see excellent agreement
between the initial and reconstructed correlation functions.
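The comparison described above requires an estimate of K(r) from a finite sequence. A minimal sketch of such an estimator, following the definition in Eq. (67), is given below. The function name `correlation_function` is hypothetical, and the alternating test sequence is used only because its correlations are known exactly.

```python
import numpy as np

def correlation_function(a, r_max):
    """Estimate K(r) = <a_i a_{i+r}> - <a>^2 for r = 0..r_max from one sequence."""
    a = np.asarray(a, dtype=float)
    a_bar = a.mean()
    K = np.empty(r_max + 1)
    K[0] = a_bar * (1.0 - a_bar)            # for binary symbols <a_i a_i> = <a_i>
    for r in range(1, r_max + 1):
        K[r] = np.mean(a[:-r] * a[r:]) - a_bar ** 2
    return K

# Strictly alternating sequence 0, 1, 0, 1, ...: perfectly anti-persistent.
alternating = np.tile([0, 1], 500)
print(correlation_function(alternating, 2))
```

For the alternating sequence the estimator gives K(0) = 0.25, K(1) = -0.25, and K(2) = 0.25.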

Let us demonstrate the effectiveness of our concept of additive Markov chains in investigating the correlation
properties of coarse-grained literary texts. First, we use the coarse-graining procedure and map the letters of the text
of the Bible [24] onto the symbols zero and unity (here, (a-m) → 0, (n-z) → 1). Then we examine the correlation
properties of the constructed sequence and calculate numerically the variance D(L). The result of the simulation of the
normalized variance $D_n(L) = D(L)/[4\bar{a}(1 - \bar{a})]$ is presented by the solid line in Fig. 6. The denominator
$4\bar{a}(1 - \bar{a})$ in the equation for the normalized variance $D_n(L)$ is inserted in order to take into account the
inequality of the numbers of zeros and unities in the coarse-grained literary texts. The straight dotted line in this
figure describes the variance $D_0(L) = L/4$, which corresponds to non-biased non-correlated Brownian diffusion. The
deviation of the solid line from the dotted one demonstrates the existence of correlations in the text. It is clearly seen
that the diffusion is anti-persistent at small distances, L < 300 (see the inset of Fig. 6), whereas it is persistent at long
distances.

FIG. 6: The normalized variance $D_n(L)$ for the coarse-grained text of the Bible (solid line) and for the sequence generated by
means of the reconstructed memory function F(r) (dots). The dotted straight line describes the non-biased non-correlated
Brownian diffusion, $D_0(L) = L/4$. The inset demonstrates the anti-persistent dependence of the ratio $D_n(L)/D_0(L)$ on L at
short distances.
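The coarse-graining and the variance calculation described above can be sketched as follows. The treatment of characters outside a-z (they are skipped) and the use of all overlapping windows of length L are assumptions of this example, as are the function names.

```python
import numpy as np

def coarse_grain(text):
    """Map letters a-m to 0 and n-z to 1; all other characters are skipped (assumption)."""
    symbols = [0 if ch <= "m" else 1 for ch in text.lower() if "a" <= ch <= "z"]
    return np.array(symbols, dtype=np.int8)

def variance_of_words(a, L):
    """Variance D(L) of the number of unities in all overlapping words of length L."""
    a = np.asarray(a, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(a)))
    counts = csum[L:] - csum[:-L]           # number of unities in each window
    return counts.var()

def normalized_variance(a, L):
    """D_n(L) = D(L) / [4 a_bar (1 - a_bar)], which removes the effect of bias."""
    a_bar = np.mean(a)
    return variance_of_words(a, L) / (4.0 * a_bar * (1.0 - a_bar))

sample = coarse_grain("In the beginning God created the heaven and the earth")
print(sample[:12], normalized_variance(sample, 3))
```

With this normalization an uncorrelated sequence gives $D_n(L) \approx L/4$ for any bias, so deviations from the straight line can be attributed to correlations.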

The memory function F(r) for the coarse-grained text of the Bible at r < 300, obtained by numerical solution of Eq. (71),
is shown in Fig. 7. At long distances, r > 300, the memory function can be nicely approximated by the power function
$F(r) = 0.25\,r^{-1.1}$, which is presented by the dash-dotted line in the inset of Fig. 7.

FIG. 7: The memory function F(r) for the coarse-grained text of the Bible at short distances. In the inset, the power-law decreasing
portions of the F(r) plots for several texts. The dots correspond to "Pygmalion" by B. Shaw. The solid line corresponds to the
power-law fitting of this function. The dash-dotted and dashed lines correspond to the Bible in English and Russian, respectively.
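A power-law approximation of the tail of a memory function can be obtained, for example, by a straight-line fit on a log-log scale. The sketch below is one simple way to do this and is not necessarily the fitting procedure used for Fig. 7. The function name `fit_power_law` is hypothetical, and the data in the example are synthetic.

```python
import numpy as np

def fit_power_law(r, F):
    """Least-squares fit of F(r) = A * r**(-beta) on a log-log scale.

    Requires r > 0 and F > 0; returns the pair (A, beta).
    """
    r = np.asarray(r, dtype=float)
    F = np.asarray(F, dtype=float)
    slope, intercept = np.polyfit(np.log(r), np.log(F), 1)
    return np.exp(intercept), -slope

# Synthetic tail with a known amplitude and exponent.
r = np.arange(300, 3000)
F_tail = 0.25 * r ** -1.1
A, beta = fit_power_law(r, F_tail)
print(A, beta)
```

On noisy data the fit should be restricted to the range where F(r) is positive and the log-log plot is visibly linear.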

Note that the region r < 40, where the memory function is negative (anti-persistent), gives rise to anti-persistent
behavior of the variance D(L) over much longer distances, L ~ 300.

Our study reveals the existence of two characteristic regions with different behavior of the memory function and,
correspondingly, of persistent and anti-persistent portions in the D(L) dependence. This appears to be a prominent
feature of all texts written in any language. The positive persistent portions of the memory functions are given in
the inset of Fig. 7 for the coarse-grained English and Russian texts of the Bible (dash-dotted and dashed lines, Refs. [24]
and [25], correspondingly). Besides, for comparison, the memory function of the coarse-grained text of "Pygmalion"
by B. Shaw [26] is presented in the same inset (dots); the power-law fitting is shown by the solid line.
It is interesting to note that the memory function of any text mimics the correlation function, as was found for
the model example Eq. (73). This fact is confirmed by Fig. 8, where the correlation function of the coarse-grained
text of the Bible is shown. One can see that its behavior at both short and long scales is similar to that of the memory
function presented in Fig. 7. However, the exponents in the power-law approximations of the K(r) and F(r) functions
differ substantially.

FIG. 8: The correlation function K(r) for the coarse-grained text of the Bible at short distances. In the inset, the power-law
decreasing portion of the K(r) plot for the same text. The solid line corresponds to the power-law fitting of this function.




CONCLUSION 



Thus, a simple, exactly solvable model of the uniform binary N-step Markov chain is presented. The memory
length N, the parameter μ of the persistent correlations, and the bias parameter ν are the three parameters of our theory.
The correlation function K(r) is usually employed as the input characteristic for the description of correlated
random systems. Yet, the function K(r) describes not only the direct interconnection of the elements $a_i$ and $a_{i+r}$, but
also takes into account their indirect interaction via other elements. Since our approach operates with the "original"
parameters N, μ, and ν, we believe that it allows us to reveal the intrinsic properties of the system which provide the
correlations between the elements.

We have demonstrated the efficiency of describing symbolic sequences with long-range correlations in terms
of the memory function. An equation connecting the memory and correlation functions of the system under study is
obtained. This equation allows reconstructing a memory function from a correlation function of the system. In effect,
the memory function appears to be a suitable, informative "visiting card" of any symbolic stochastic process. The
effectiveness and robustness of the proposed method are demonstrated by simple model examples. Memory functions
for some concrete examples of coarse-grained literary texts are constructed and their power-law behavior at long
distances is revealed. Thus, we have shown the complexity of organization of literary texts, in contrast to the
previously discussed simple power-law decrease of correlations.

If the memory length N of the system under consideration is of the order of the system length itself, then the Markov
chain modeling the system could be non-stationary. In this case the proposed method does not allow one to describe the
system precisely, as distinct from the method proposed in j^, H3 ■ 



APPENDIX. MATRIX OF THE CONDITIONAL PROBABILITY 



In this Appendix, we prove the property of metrical transitivity of the N-step Markov chains.
It is possible to look at the Markov chain from another point of view and consider it as a 1-step vector Markov
chain. To this end, we introduce the N-component vector function $X_i$,

$$X_i = (a_{i+1}, a_{i+2}, \ldots, a_{i+N}), \qquad i = \ldots, -2, -1, 0, 1, 2, \ldots \tag{74}$$
The number of different sets of symbols $(a_{i+1}, a_{i+2}, \ldots, a_{i+N})$ is equal to $Q = 2^N$. We number the different states of
the vector $X_i$ by their binary representation,

$$D(a_N, a_{N-1}, \ldots, a_1) = a_N 2^0 + a_{N-1} 2^1 + \ldots + a_1 2^{N-1}, \qquad 0 \leq D \leq 2^N - 1. \tag{75}$$

The matrix elements $M_{ik}$ of the probability matrix M, i.e. the probabilities of transition of the vector
$X = (a_1, a_2, \ldots, a_N)$ into the vector $Y = (a'_1, a'_2, \ldots, a'_N)$, can be expressed via the conditional probability
function $P(a_i \mid T_{N,i})$. The subscripts i and k of the matrix $M_{ik}$ are determined by the binary representations of the
sequences $(a_1, a_2, \ldots, a_N)$ and $(a'_1, a'_2, \ldots, a'_N)$, correspondingly: $i = 1 + D(a_N, a_{N-1}, \ldots, a_1)$ and
$k = 1 + D(a'_N, a'_{N-1}, \ldots, a'_1)$. Every matrix row contains only two non-zero elements since the vector Y can take
only two values, namely $(a_2, a_3, \ldots, a_N, 0)$ and $(a_2, a_3, \ldots, a_N, 1)$. For $k \leq Q/2$, let us denote the probability of
occurrence of $a_{N+1} = 0$ as $1 - P_k$, where the index k is equal to $k = 1 + D(a_N, a_{N-1}, \ldots, a_1)$ in the binary
representation.

For the index k in the range from Q/2 + 1 to Q, we denote the probability of occurrence of the symbol $a_{N+1} = 0$
after the word $a_1, a_2, \ldots, a_N$ as $P_k$. Then, $1 - P_k$ is the probability of occurrence of the symbol unity. Taking into
account that $a_N = 0$ for $k \leq Q/2$ and the obvious relations,

$$D(a_{N-1}, \ldots, a_1, 0) = 2D(a_N, a_{N-1}, \ldots, a_1),$$

$$D(a_{N-1}, \ldots, a_1, 1) = 2D(a_N, a_{N-1}, \ldots, a_1) + 1,$$

we get the transition probabilities matrix M:



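$$M = \begin{pmatrix}
1 - P_1 & P_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 - P_2 & P_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 - P_{Q/2} & P_{Q/2} \\
P_{Q/2+1} & 1 - P_{Q/2+1} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & P_{Q/2+2} & 1 - P_{Q/2+2} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & P_Q & 1 - P_Q
\end{pmatrix}. \tag{76}$$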
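The structure of the matrix (76) can be made concrete with a short sketch that fills the two non-zero elements of every row. It assumes the layout used in the Q = 2 and Q = 4 examples of this Appendix: rows 1, ..., Q/2 contain the pair (1 - P_k, P_k) and rows Q/2 + 1, ..., Q contain the pair (P_k, 1 - P_k). The function name `transition_matrix` is hypothetical.

```python
import numpy as np

def transition_matrix(P):
    """Build the Q x Q matrix M from the probabilities P[k-1] = P_k, k = 1..Q.

    Q must be even (Q = 2**N for an N-step chain).
    """
    P = np.asarray(P, dtype=float)
    Q = len(P)
    if Q < 2 or Q % 2:
        raise ValueError("Q must be an even number, Q = 2**N")
    half = Q // 2
    M = np.zeros((Q, Q))
    for k in range(Q):                      # 0-based row index, row number k + 1
        col = 2 * (k % half)                # 0-based index of the first non-zero column
        if k < half:
            M[k, col], M[k, col + 1] = 1.0 - P[k], P[k]
        else:
            M[k, col], M[k, col + 1] = P[k], 1.0 - P[k]
    return M

M4 = transition_matrix([0.1, 0.2, 0.3, 0.4])   # two-step chain, Q = 4
print(M4)
print(M4.sum(axis=1))                          # every row sums to unity
```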



Thus, to determine the vector b of the probability distribution of N-words in the stationary Markov chain we need to
solve the system of equations,

$$\sum_{k=1}^{Q} b_k M_{ki} = b_i, \qquad \sum_{k=1}^{Q} b_k = 1. \tag{77}$$
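For small Q the system (77) can also be solved directly. The sketch below appends the normalization condition to the homogeneous equations and solves the resulting overdetermined but consistent system by least squares. This is only one of several possible numerical routes, and the function name `stationary_distribution` is hypothetical.

```python
import numpy as np

def stationary_distribution(M):
    """Solve b M = b together with sum(b) = 1 for a row-stochastic matrix M."""
    M = np.asarray(M, dtype=float)
    Q = M.shape[0]
    # b M = b is equivalent to (M^T - I) b^T = 0; the last row enforces sum(b) = 1.
    A = np.vstack([M.T - np.eye(Q), np.ones(Q)])
    rhs = np.zeros(Q + 1)
    rhs[-1] = 1.0
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b

# One-step chain (Q = 2) with P_1 = 0.2 and P_2 = 0.5.
M2 = np.array([[0.8, 0.2],
               [0.5, 0.5]])
print(stationary_distribution(M2))
```

For this Q = 2 example the balance condition $b_1 P_1 = b_2 P_2$ gives $b = (P_2, P_1)/(P_1 + P_2)$, which the numerical solution reproduces.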



In other words, one needs to obtain the normalized eigenvector corresponding to the eigenvalue λ = 1 of the matrix
$M^{T}$ of order $Q = 2^N$. It is clear that if the vector b satisfies the condition bM = b, then for every integer k the
condition $bM^k = b$ is also true; here $M^k$ is the k-th power of the matrix M. Let us consider the matrix $M^N$ and show
that all its matrix elements are positive. In this case, following the Markov theorem, we can conclude that the matrix M
determines uniquely the probability distribution of the words.

Let us suppose that for any k < N the matrix $M^k$ satisfies the following conditions: in the first row the elements
$M^k_{1i}$ with $i = 1, \ldots, 2^k$ are positive; in the second row the positive elements are $M^k_{2i}$ with
$i = 2^k + 1, \ldots, 2 \times 2^k$; ...; in the $2^{N-k}$-th row, $i = (2^{N-k} - 1)2^k + 1, \ldots, 2^N$. In the following rows this
order is repeated. Let us demonstrate that if the matrix $M^k$ obeys these rules, then the same is true for the matrix $M^{k+1}$.

After multiplication of the matrices $M^k$ and M, the elements of the obtained matrix are defined by the expression

$$M^{k+1}(i, j) = \sum_{l=1}^{Q} M^{k}(i, l)\,M(l, j). \tag{78}$$



Let us consider the first row of the matrix $M^{k+1}$, i = 1. In each column of the matrix M only two elements are
non-zero. After multiplication of the first row of the matrix $M^k$ by some column of the matrix M, the result is non-zero
(positive) for $j \leq 2 \times 2^k$ only, because the positive elements of the matrix M correspond to the positive zone
($i \leq 2^k$) of the first row of the matrix $M^k$ only for these j. So the described rule holds for the first row of the
matrix $M^{k+1}$. Similarly, this fact can be proved for the other rows.

The matrix $M^1$ obeys this rule; consequently, by induction, it is true for all $M^k$. According to this rule, if the
power k = N, then all elements of the matrix $M^N$ are positive.
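The induction argument can be checked numerically on the pattern of non-zero elements alone. The sketch below assumes that all probabilities $P_k$ lie strictly between 0 and 1, so that every structurally non-zero element of M is positive. It then replaces M by a matrix of ones in the same positions and inspects its powers. The function name `positive_pattern` is hypothetical.

```python
import numpy as np

def positive_pattern(N, k):
    """Boolean matrix marking the positive elements of M**k for an N-step chain."""
    Q = 2 ** N
    A = np.zeros((Q, Q), dtype=np.int64)
    for i in range(Q):
        first = (2 * i) % Q                 # 0-based column of the first non-zero element
        A[i, first] = 1
        A[i, first + 1] = 1
    return np.linalg.matrix_power(A, k) > 0

N = 3
for k in range(1, N + 1):
    pattern = positive_pattern(N, k)
    # Count the positive elements in every row of M**k.
    print(k, pattern.sum(axis=1))
```

Each row of $M^k$ contains $2^k$ positive elements, so that for k = N the whole matrix is positive.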
Therefore, from the Markov theorem, there is a unique solution of the system $bM^N = b$ (or $bM = b$). This
solution can be obtained by the method of successive approximations,

$$b_i^{k+1} = \sum_{j=1}^{Q} b_j^{k} M(j, i), \qquad k = 0, 1, 2, \ldots, \tag{79}$$

if we start from an arbitrary initial distribution $b^0$. In the limit $k \to \infty$ we arrive at the stationary distribution of the
probability vector b.
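The successive approximations of Eq. (79) amount to repeated multiplication of a row vector by M. A minimal sketch is given below. The stopping tolerance `tol`, the iteration limit `max_iter`, and the uniform initial distribution are choices made for this example.

```python
import numpy as np

def stationary_by_iteration(M, tol=1e-12, max_iter=100000):
    """Iterate b^{k+1} = b^k M until the distribution stops changing."""
    M = np.asarray(M, dtype=float)
    Q = M.shape[0]
    b = np.full(Q, 1.0 / Q)                 # arbitrary initial distribution b^0
    for _ in range(max_iter):
        b_next = b @ M                      # one step of Eq. (79)
        if np.max(np.abs(b_next - b)) < tol:
            return b_next
        b = b_next
    raise RuntimeError("iteration did not converge")

# One-step chain (Q = 2) with P_1 = 0.2 and P_2 = 0.5.
M2 = np.array([[0.8, 0.2],
               [0.5, 0.5]])
print(stationary_by_iteration(M2))
```

The rate of convergence is set by the second-largest eigenvalue modulus of M, so chains with strong persistence need more iterations.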

Taking into account the explicit form of the matrix M, Eq. (77) reduces to the following equations:

$$b_i(1 - P_i) + b_{i+Q/2}P_{i+Q/2} = b_{2i-1},$$

$$b_i P_i + b_{i+Q/2}(1 - P_{i+Q/2}) = b_{2i}. \tag{80}$$

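For Q = 2 we get the well-known result [21]:

$$M = \begin{pmatrix} 1 - P_1 & P_1 \\ P_2 & 1 - P_2 \end{pmatrix}, \qquad b_1 = \frac{P_2}{P_1 + P_2}, \qquad b_2 = \frac{P_1}{P_1 + P_2}.$$

And in the case Q = 4 we obtain the following result:

$$M = \begin{pmatrix} 1 - P_1 & P_1 & 0 & 0 \\ 0 & 0 & 1 - P_2 & P_2 \\ P_3 & 1 - P_3 & 0 & 0 \\ 0 & 0 & P_4 & 1 - P_4 \end{pmatrix},$$

$$b_1 = \frac{P_3 P_4}{P_1 P_2 + 2P_1 P_4 + P_3 P_4}, \qquad b_2 = b_3 = \frac{P_1 P_4}{P_1 P_2 + 2P_1 P_4 + P_3 P_4}, \qquad b_4 = \frac{P_1 P_2}{P_1 P_2 + 2P_1 P_4 + P_3 P_4}.$$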